Initial document: Paul F. Johnson, 26 Feb. 2004, Version: 0.04
Revised: Craig Ringer, 16 Aug. 2005, Version: 0.05 (text encodings, exceptions, who to send patches to)
Revised: Peter Linnell, 06 Sept. 2005, Version: 0.06 (more text encodings stuff from recent discussions)
This document is intended as a guide to all Scribus developers and contributors. It is aimed to be simple to use, as well as to keep the source code simple to understand and maintain.
All code should be documented clearly.
Prior to any method, the following should be there
/*! \fn functionName::functionName() \author Authors Name \date \brief Constructor for class \param None \retval None */
There should be something similar prior to the class definitions.
In preference, all details should be in English. This includes comments.
Unless a comment occupies a single line, //
should be used in preference to /* */
All variables should begin in lower case letters. If the variable has more than one part to it, then the second part should begin in a capital letter. For example int foo
and int fooBar
. Under no circumstances should an underscore (_) be used.
Variables should be clear in their naming. The exception to this is in a for loop. In preference, these should be in English. We are in the process of converting and altering the current ones to fit in with this.
Any temporary variables (such as those used in a for
loop or those only used within a scope), should only be declared where they are needed and not used in a global scope. If there is a condition at the start of a method, any variables required should be declared, but nothing else. For example
void foo::bar(f *b) { int j; string g = b->passed_in; double fg, m, c; which_string(g, &j); if (j == true) return; // rest of the code }
should not have the double variables declared. It is arguable if g should be declared, but in this example, it doesn't matter.
If a variable is not required, don't create it. In the example above, there is no requirement for g to be used.
In class definitions
With the exception of the access declarations, a single tab should be used prior to all method and variable definitions. There should be no tabs prior to access declarations. There is a tab prior to Q_Object
.
In the main code
There should be a tab at the start of every line. With any conditional line, there should be an opening brace by itself. On the next line down, there should be another tab prior to any code. The close brace should be underneath the open brace.
Always make a conditional logical. This may sound silly, but look at the following example
int foobar(bool m) { if (m == true) { somethingA(1); somethingB(1); somethingC(1); } else { somethingA(0); somethingB(0); somethingC(0); } }
This can be re-written to make things clearer
int foobar(bool m) { int c = m == true ? 1 : 0; somethingA(c); somethingB(c); somethingC(c); }
A lot simpler to see and will usually optimize far better.
Ternary conditionals (?) should be used in preference to if
/ else
. These are usually far quicker to process the code once compiled.
Depending on what is being performed, switch statements should be avoided. There is usually a simpler way to avoid them. For example
int b; string s; switch (f) { case 1 : b = 4; s = "four"; break; case 2 : b = 8; s = "eight"; break; case 3 : b = 12; s = "twelve"; }
can be re-written as
int b = f * 4; char *number[] = {"four", "eight", "twelve"}; size_t no = sizeof(number) / sizeof(*number); s = number(f);
The switch has gone and the code generated runs far faster and as the conditions don't need to be created, will optimize far better. Note: when using the QObject::tr()
function, this method of using char* may not always work properly. An alternative is to use QString number[] = {"four", "eight", "twelve"};
etc.
Obviously for more complex switch conditions, the above can't be used.
Formatting should look like this:
if (fred == 5) { // something // something }
Switch formatting should look like this:
switch (f) { case 0: if (fred == 1) { // something // more // ok } else { if (jimmy == 0) { // hi // code } // something here } break; }
Simple to read and simple to see where the conditional ends.
Under NO circumstances should you have a line such as:
if (foo == bar) { somethingA(0); }
This should be written as:
if (foo == bar) somethingA(0);
If braces are not required, don't use them. For example, braces wouldn't be needed here
for (int j = 0; j < 10; ++j) cout << j << "\n"?;
In for loops, preincrements should be used (++a rather than a++). Under ARM processors and x64 processors (not sure about ia64) ++a requires half the number of clock cycles than the post processor style. In some circumstances, ++a is far more efficient under ia32. It is even more efficient when used with iterators. Post increments should be used in the following sort of code:
if (b == true) { m->wibble(c); m->wobble(c++); }
There should be a space after:
if (a == true)
- there is a space either side of the ==
conditional)Prior to an open bracket or after a close bracket:
Templates for both types and classes are fine to be used. The project recommends using g++ 3.x which fully support templates and template classes. Please remember the overhead involved in using templates!
When using templates, please remember the following:
Unless you really know what you're doing and more over, what the code is doing, do not attempt to do any code optimization. In a lot of respects, optimization is a "black art".
Code such as a = b = c = 0;
is more efficient (through the compiler) than a=0; b=0; c=0;
Careful thought about how the code works is usually far more beneficial than attempting any of the "tricks" currently employed.
Any large scale changes should be approved of first. Sometimes a small change can have large side effects.
Multiple separate lines containing calls to Qt methods are far less efficient than multiple calls contained in a for loop.
QString fred = QString("bob")
is not the same as QString fred("bob")
. The first one creates an additional temporary that's then thrown away. That's slow, and should generally be avoided.
In constructors, it is somewhat faster to use:
Fred::Fred() : ParentClass(), argument1("initialValue"), argument2(INITIAL_VALUE) { }
than
Fred::Fred() : ParentClass() { argument1 = "initialvalue"; argument2 = INITIAL_VALUE; }
for much the same reasons as outlined above. It's best to initialize the members in the same order that they're listed in the class declaration.
Due to the unavailability of exceptions (see below), we're using Q_CHECK_PTR
to check allocations. This means that you get an abort with a useful message when an allocation fails, rather than an unexplained segfault.
int *m = new int[2048]; Q_CHECK_PTR(m);
All calls to new
should ideally be trapped. This is not currently implemented in current releases. The preferred method would be:
#include <new> try { m = new int[2048]; } catch(bad_exception) { cout << "memory allocation to m failed - <method name<" << endl; exit(EXIT_FAILURE); }
As there are many calls in Scribus to new, these will have a wrapper around.
For users of Qt who compile their own, they will need to include in the ./configure
script that they want to throw C++ exceptions. Failure to do so will result in the catches not catching and possibly Scribus failing to run.
Currently (for v3.2 - v3.4), KDE recommend Qt is compiled without exceptions. This is a bad idea as it means trapping bad memory allocations cannot be caught using the preferred method. As most distributions follow this recommendation, that means we (Scribus) can't expect to be able to rely on exceptions.
Text encodings need to be handled carefully. Strange errors can result if you treat UTF-8 text as latin-1, for example. These issues are often hard to track down, so careful coding is important. A detailed treatment of text encodings won't fit here, but some information will be provided elsewhere in the documentation. This is an early version of this part of the text; feedback would be appreciated.
The first thing you need to understand is that Qt's QString
, and most other Qt classes, know their own encoding and take care of text encoding issues for you, so they can be treated as "just text" when passing them around inside Scribus and to other Qt classes. It's only at the interfaces between byte-oriented and "text"-oriented code that care is required. These interfaces can be tricky to spot (the implicit conversion of a QString
to a QCString
or char*
, for example, is easily missed) so caution and careful reading of API documentation is required.
While there is no real substitute for understanding text encodings, some of the most important rules to follow are:
Always construct a QString
from a byte string using one of the QString::fromblah(...)
methods. You should never use QString(char*)
or QString(QCString)
. You need to determine what encoding the byte string you're working with is in and use the correct call, by examining the documentation for the system / library you are calling. There is generally no way to tell what encoding a string is in by examining the string - at best this is guesswork. Do not assume a string is utf-8 or latin-1, since even if it works for you, in some locales it will not. For data read from the OS or C libraries, you generally want QString::fromLocal8Bit(...)
. If in doubt, ask somebody.
Similarly, you should always use the appropriate QString
conversion methods when converting QString
s to byte strings (e.g. QCString
, QByteArray
, or char*
). Note that such conversions may happen automatically, so watch out for implicit conversions. If sending data to system calls or C libraries you'll generally want QString::local8Bit()
but you should check to confirm that's correct for the specific call you're making. Some win32 calls, for example, expect UCS-2 (wchar) data, and some C libraries will expect UTF-8 strings to be passed to some methods.
You should absolutely never do this:
QString blah = anotherQString.utf8();
The above code:
QCString
; thenQString(QCString)
constructor for blah;QString(QCString)
decodes the passed QCString
according to the default byte string locale set in qt which is usually latin-1 (don't rely on that; it should change to depend on locale at some point.).Thus, you end up doing a lot of pointless work encoding and decoding the string, and to top it off probably mangle it by decoding a utf-8 string as latin-1. You usually won't notice this, because you probably only send ASCII data thorough, and thus the bug will only turn up much later from people in other locales who use other text encodings. These issues tend to be a pain to track down.
For similar reasons, when you're reading or writing a QTextStream
, remember to call QTextStream::setEncoding(...)
before reading/writing any data. Use operator<<
with QString
s, but WriteRawBytes(...)
when writing byte strings that are already in the correct encoding. Byte strings in other encodings should be converted to QString
or transcoded using QTextCodec
. Do not just assume the QTextStream
should write out UTF-8. The correct encoding will depend on where the data is going / coming from. For example, a plain text file for the user to edit should be in their local text encoding. If you're not sure, ask someone.
Don't convert a QString to a byte string when writing to a QTextStream - the QTextStream does that for you in operator<< . For example:
QTextStream ts(&file); // In this case we know we want to write out UTF-8 to this file, not write in // the local 8-bit byte string encoding. ts.setEncoding(QTextStream::UnicodeUTF8); ts << someQString.utf8();
as that's the same error as in the previous example, just written a different way. Instead, you should omit the .utf8()
.
The best way to handle text encodings is to convert 8-bit byte strings to QString
or another "safe" class as early as possible in input, and convert to 8-bit byte strings as late as possible on output. Above all else, do it only once and watch out for implicit conversions. You should think about encodings whenever passing text across any interface that wants byte-oriented strings rather than QString
s. Remember to read the documentation of the APIs you're using to determine how they behave with text encodings, and if you're not sure, ask somebody.
It is strongly recommended that you read http://doc.trolltech.com/3.3/i18n.html for more coverage of this issue, especially the section "Support for Encodings". You should also read Achtung! Binary and Character data.
Please use:
"<qt>" + tr("text for translation") + "</qt>"
not this:
tr("<qt>text for translation</qt>")
it's really annoying to translators.
If you are writting class which is not inherited from QObject or any Qt class, you cannot use inherited tr() method. If you are using something like:
QObject::tr("text for translation")
replace it with:
QObject::tr("text for translation", "describe what is it related")
An example from the scripter:
QObject::tr("Item is not text frame", "scripter error")
It helps translators with context and it tells them what they are translating too, though in some cases strings are generic in nature.
There's another issue related to translations, though it's more for encoding correctness etc: Please don't use:
tr( QString("blah %1").arg(something) )or otherwise use a QString inside a tr(...) . Doing so causes an implicit conversion to `const char*', in which the QString converts to a latin-1 byte string. If some characters cannot be represented in latin-1 they're replaced with `?' . The correct usage is:
tr("blah %1","context").arg("something")
... exploiting the fact that tr(...) returns a QString and that you can chain calls to QString::arg. That avoids any implicit conversions, and will avoid a whole class of weird encoding bugs.
If you really have to translate text that can't be represented in latin-1, you need to use QObject::trUtf8(...). If you want to translate a string with "Δ" (uppercase delta) in it, for example, we could try:
tr("Δ is %1", "context")
but that's wrong, not portable, may not compile, or worse might have different results on different systems. We must use escaped byte strings rather than including a unicode or extended latin-1 char directly in source. The UTF-8 byte sequence for that character is \xce\x94, so we could try:
tr("\xce\x94 is %1", "context")
but that's also wrong, as that unicode byte sequence will be interpreted as latin-1 and you'll get a mangled result. You must instead use:
trUtf8("\xce\x94 is %1", "context")
Similarly, if you wanted to include the © (copyright) symbol, which exists in latin-1 as well, you could use either:
tr("something \xa9 someone", "context")
or
trUtf8("something \xc2\xa9 someone", "context")
Don't cast a pointer to an int, it causes portability issues where sizeof(int) != sizeof(void*). You might want to find another way to do what you're trying to achieve.
Code patches should be sent to Franz, Craig Bradney, or Paul. Documentation patches should be sent to Peter and Craig Bradney. This is subject to change. Scripting matters should be addressed to Petr and Craig Ringer.
Patches should be in plain text. diff -u old_file.cpp new_file.cpp >difference.diff
is the best way of sending them. Please do not send patches to the main list. Remember to use -u
(unified diff) as other types of diff are painful to use.
Please ensure that when you have changed something you put a comment immediately before the change. Ensure you put a date and who made the change and what the change is for. Make sure that the comments include the original line of code. Over time, these will be weeded out, but it useful to see what has been altered.
Send a single diff for each file altered. It makes committing the changes far more simple for us!
Please use the current functions and refer to the Plugin howto for further information.
All code called from the main Scribus application has to be encompassed within an extern "C" { ... };
to enable the plugin to work.
Ensure that all the code is documented and follows the coding guidelines set out above.